This file will eventually become your project report for P02: Exploratory Data Analysis. Specifically, you will write R Markdown to report your exploratory data analysis.
Please see Canvas for more details.
# Example 1: Note relative path, which can be read: Up one
# directory(..), down into source (/source), and
# then "source" an R file (data_access.R)
source("../source/data_access.R")
data_access_test()
## [1] "Hello: World!"
# Example 1: This function was "sourced" above
msg <- data_access_test(" Morgan!")
Hello: Morgan! Hope you have a good day!!
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
##   total completed missing completion_rate
## 1 10199      4707    5492          0.4615
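The columns in the summary above appear to be related by simple arithmetic. The following is a minimal sketch of that relationship, assuming `missing` is the difference between `total` and `completed` and that the rate is rounded to four decimal places. The variable names are illustrative and are not taken from the report's own code.

```r
# Counts copied from the summary table above.
total     <- 10199  # respondents in the baseline wave
completed <- 4707   # respondents who also completed the follow-up

# Assumption: "missing" means baseline respondents with no follow-up record.
missing <- total - completed

# Assumption: the rate is completed / total, rounded to 4 decimal places.
completion_rate <- round(completed / total, 4)

summary_row <- data.frame(
  total = total,
  completed = completed,
  missing = missing,
  completion_rate = completion_rate
)
print(summary_row)
```

Under these assumptions the sketch reproduces the `missing` and `completion_rate` values shown in the table.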
## # A tibble: 2 × 3
## # Groups: Drug status [2]
## `Drug status` number_base percentage_base
## <int> <int> <dbl>
## 1 0 9599 94.1
## 2 1 600 5.88
## # A tibble: 2 × 4
## # Groups: Drug status [2]
## `Drug status` number_follow percentage_follow whole_percentage
## <int> <int> <dbl> <dbl>
## 1 0 4488 95.4 44
## 2 1 219 4.65 2.15
## # A tibble: 2 × 6
## # Groups: Drug status [2]
## `Drug status` number_base percentage_base number_follow percentage_f…¹ whole…²
## <int> <int> <dbl> <int> <dbl> <dbl>
## 1 0 9599 94.1 4488 95.4 44
## 2 1 600 5.88 219 4.65 2.15
## # … with abbreviated variable names ¹percentage_follow, ²whole_percentage
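The three percentage columns use different denominators, which is easy to miss in the printed output. Here is a hedged sketch in base R, assuming `percentage_base` and `percentage_follow` are shares within their own wave and `whole_percentage` is the follow-up count divided by the full baseline sample. The data frame layout and the `drug_status` column name are assumptions for illustration only.

```r
# Counts copied from the drug-status tables above.
drug <- data.frame(
  drug_status   = c(0L, 1L),
  number_base   = c(9599L, 600L),
  number_follow = c(4488L, 219L)
)

# Share of each group within its own wave.
drug$percentage_base   <- 100 * drug$number_base / sum(drug$number_base)
drug$percentage_follow <- 100 * drug$number_follow / sum(drug$number_follow)

# Follow-up respondents as a share of the whole baseline sample (assumed).
drug$whole_percentage  <- 100 * drug$number_follow / sum(drug$number_base)

print(round(drug, 2))
```

The same pattern would apply to the lottery-status tables, with seven rows instead of two.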
## # A tibble: 7 × 3
## # Groups: lottery status [7]
## `lottery status` number_base percentage_base
## <int> <int> <dbl>
## 1 0 441 4.32
## 2 1 1344 13.2
## 3 2 1726 16.9
## 4 3 2341 23.0
## 5 4 2904 28.5
## 6 5 1222 12.0
## 7 6 221 2.17
## # A tibble: 7 × 4
## # Groups: lottery status [7]
## `lottery status` number_follow percentage_follow whole_percentage
## <int> <int> <dbl> <dbl>
## 1 0 257 5.46 2.52
## 2 1 837 17.8 8.21
## 3 2 587 12.5 5.76
## 4 3 958 20.4 9.39
## 5 4 1323 28.1 13.0
## 6 5 604 12.8 5.92
## 7 6 141 3 1.38
## # A tibble: 7 × 6
## # Groups: lottery status [7]
## `lottery status` number_base percentage_base number_follow percentag…¹ whole…²
## <int> <int> <dbl> <int> <dbl> <dbl>
## 1 0 441 4.32 257 5.46 2.52
## 2 1 1344 13.2 837 17.8 8.21
## 3 2 1726 16.9 587 12.5 5.76
## 4 3 2341 23.0 958 20.4 9.39
## 5 4 2904 28.5 1323 28.1 13.0
## 6 5 1222 12.0 604 12.8 5.92
## 7 6 221 2.17 141 3 1.38
## # … with abbreviated variable names ¹percentage_follow, ²whole_percentage
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
To explore whether gambling tendencies are associated with drug and alcohol use, we charted gambling debt against drug use while gambling. The stacked bar chart is color-coded: the top portion of each bar represents people who always used drugs while they gambled, and the bottom portion represents people who never did. Drug use appears to rise with gambling. People with no gambling debt, who gambled the least of those surveyed, reported noticeably less drug use while gambling, whereas people with more gambling debt reported more.
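The comparison behind the stacked bar chart described above can be sketched as a table of within-category shares. This is only an illustration: the records below are made up, and the column names `debt` and `drug_use` are hypothetical rather than the survey's actual variable names.

```r
# Made-up illustrative records, NOT the survey data.
toy <- data.frame(
  debt     = c("none", "none", "none", "some", "some", "high", "high", "high"),
  drug_use = c("never", "never", "always", "never", "always",
               "always", "always", "never")
)

# Cross-tabulate drug use (rows) by debt category (columns).
counts <- table(toy$drug_use, toy$debt)

# Convert to shares within each debt category, so every column sums to 1.
shares <- prop.table(counts, margin = 2)
print(round(shares, 2))

# Uncomment to draw the shares as a stacked bar chart with a legend:
# barplot(shares, legend.text = TRUE, xlab = "Gambling debt", ylab = "Share")
```

Comparing shares rather than raw counts keeps debt categories with very different sizes on the same footing.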
To examine whether personality plays a role in compulsive gambling, we looked at the relationship between gambling tendencies and impulsiveness. The impulsiveness measure comes from the NEO Personality Index: the gambling survey included the index’s “Impulsiveness” section so that researchers could test whether impulsiveness as a personality trait is associated with a tendency to gamble. The chart above shows each gambler’s self-reported debt alongside their impulsiveness score, which ranges from 0 points (least impulsive) to 32 points (most impulsive). The chart suggests that any correlation between impulsiveness and gambling tendencies is small. The frequency peak for each gambling category falls at a similar point on the impulsiveness scale, and even respondents with more debt did not report notably higher impulsiveness scores.
This chart shows alcohol use versus gambler debt. (We are a three-person group, so this is just an extra graph for fun.)